Skip to content

Fix TestScheduleStop: replace cron.FuncJob injection with real *Job lifecycle#1955

Merged
vcastellm merged 2 commits intomainfrom
copilot/fix-ci-failing-job-69059842947
Mar 29, 2026
Merged

Fix TestScheduleStop: replace cron.FuncJob injection with real *Job lifecycle#1955
vcastellm merged 2 commits intomainfrom
copilot/fix-ci-failing-job-69059842947

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 29, 2026

TestScheduleStop was manually injecting a cron.FuncJob via sched.Cron.AddFunc() and directly setting sched.started = true. When sched.Start() was called, ClearCron() failed the *Job type assertion on the FuncJob, emitting an error log that caused CI to exit 1.

Proposed changes

  • Replace internal state mutation (sched.Cron = ..., sched.started = true, sched.Cron.AddFunc(...)) with a proper sched.Start([]*Job{testJob1}, &Agent{}) call to establish the started state through the public API.
  • Use @every 10m schedule instead of @every 2s to prevent the job from actually firing during the test — avoids a nil job.logger panic that occurs when Job.Run() executes before GetEntryJob() has set the logger.
  • Remove unused imports: fmt, github.com/distribworks/dkron/v4/extcron, github.com/robfig/cron/v3.

The test still validates the full scheduler lifecycle:

  • First Start() succeeds
  • Second Start() returns an error (already started)
  • After <-sched.Stop().Done(), Start() succeeds again
  • Final Stop() leaves the scheduler stopped

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
Original prompt

Implement a fix for CI failing job 69059842947 in repository dkron-io/dkron.

Context

The failure is caused by test TestScheduleStop in dkron/scheduler_test.go injecting a cron.FuncJob using sched.Cron.AddFunc(...) and manually setting scheduler internal state. During sched.Start(...), ClearCron() in dkron/scheduler.go iterates cron entries and expects *Job, so it logs:

  • scheduler: Failed to cast job to *Job found type cron.FuncJob and removing it

The CI job exits with code 1 after these error logs.

Required Changes

  1. Update dkron/scheduler_test.go TestScheduleStop to avoid adding raw cron.FuncJob entries.
  2. Stop directly mutating scheduler internals (sched.Cron = ..., sched.started = true) in this test.
  3. Make the test start the scheduler through sched.Start(...) with a real *Job, then verify a second Start(...) call errors when already started.
  4. Keep the existing intent of the test:
    • Start returns error if scheduler already started.
    • After Stop() completes, Start succeeds again.
    • Scheduler ends stopped.
  5. Remove now-unused imports from dkron/scheduler_test.go (e.g., fmt, extcron, cron) if no longer needed.

Acceptance Criteria

  • No error log Failed to cast job to *Job found type cron.FuncJob is emitted by TestScheduleStop.
  • go test for package dkron passes.
  • The modified test remains deterministic and validates scheduler lifecycle behavior through public methods.

Suggested Implementation Outline

In TestScheduleStop:

  • Create sched := NewScheduler(log).
  • Build testJob1 as existing job fixture.
  • require.NoError(t, sched.Start([]*Job{testJob1}, &Agent{})).
  • assert.Error(t, sched.Start([]*Job{testJob1}, &Agent{})).
  • Wait briefly and stop (<-sched.Stop().Done()).
  • assert.NoError(t, sched.Start([]*Job{testJob1}, &Agent{})).
  • Final sched.Stop() and assert.False(t, sched.Started()).

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Please find a solution for failing job 69059842947. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.

Assistant: Let me fetch the job logs to investigate the failure.

Assistant: The logs mention failing tests but don't show the specific test failure output clearly. Let me fetch the test file and scheduler file to identify the root cause of the scheduler: Failed to cast job to *Job found type cron.FuncJob error, which appears to be the key issue.

Assistant: ## Root Cause

The key error from the logs is:

time="2026-03-29T10:16:13Z" level=error msg="scheduler: Failed to cast job to *Job found type cron.FuncJob and removing it"

This is triggered in ClearCron() by the TestScheduleStop test. The test deliberately adds a raw cron.FuncJob (via sched.Cron.AddFunc(...)) before calling sched.Start(), which internally calls ClearCron(). When ClearCron() iterates over entries and encounters the FuncJob, the type assertion to *Job fails — logging the error and removing the entry.

This is expected behavior in production, but the test is causing the error log to be emitted during the test run, which causes the CI job to fail (exit code 1) due to error output being treated as a test failure.

Solution

The TestScheduleStop test should avoid adding a raw cron.FuncJob directly to the cron instance, or should restructure the test to not trigger the error path in ClearCron. The intent of the test appears to be verifying that Start fails when the scheduler is already started — the AddFunc is only used to simulate a running cron.

Replace the raw AddFunc + manual state manipulation with a proper first Start() call so the scheduler is started cleanly:

func TestScheduleStop(t *testing.T) {
	log := getTestLogger()
	sched := NewScheduler(log)

	testJob1 := &Job{
		Name:           "cron_job",
		Schedule:       "@every 2s",
		Executor:       "shell",
		ExecutorConfig: map[string]string{"command": "echo 'test1'", "shell": "true"},
		Owner:          "John Dough",
		OwnerEmail:     "foo@bar.com",
	}

	// Start the scheduler with a real job
	err := sched.Start([]*Job{testJob1}, &Agent{})
	require.NoError(t, err)

	// Starting again while already started should return an error
	err = sched.Start([]*Job{testJob1}, &Agent{})
	assert.Error(t, err)

	// Wait for the job to start
	time.Sleep(time.Second * 2)
	<-sched.Stop().Done()

	err = sched.Start([]*Job{testJob1}, &Agent{})
	assert.NoError(t, err)

	sched.Stop()
	assert.False(t, sched.Started())
}

Why this fixes it

Before After
sched.Cron.AddFunc(...) injects a cron.FuncJob Only *Job instances are added via sched.Start()
ClearCron() hits the non-*Job type cast, logs an error ClearCron() only sees *Job entries, no error logged
CI fails due to error log out...

This pull request was created from Copilot chat.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Fix CI failure in job 69059842947 due to TestScheduleStop Fix TestScheduleStop: replace cron.FuncJob injection with real *Job lifecycle Mar 29, 2026
Copilot AI requested a review from vcastellm March 29, 2026 10:31
@vcastellm vcastellm marked this pull request as ready for review March 29, 2026 10:57
@vcastellm vcastellm merged commit 26912ac into main Mar 29, 2026
2 checks passed
@vcastellm vcastellm deleted the copilot/fix-ci-failing-job-69059842947 branch March 29, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants